Imputation in Missing Not at Random SNPs Data Using EM Algorithm
Abstract
The relation between single nucleotide polymorphisms (SNPs) and some diseases has attracted the attention of many researchers, and missing SNPs are quite common in genetic association studies. This article therefore investigates the relation between SNPs in DNMT1 on human chromosome 19 and colorectal cancer, and aims to present an imputation method for SNPs that are missing not at random. In this case-control study, 100 patients with colorectal cancer who consulted the Research Institute for Gastroenterology and Liver Disease of Shahid Beheshti University of Medical Sciences were taken as the case group, and 100 other patients who consulted the same institute were taken as the control group. Genetic testing was used to identify the genotypes of 6 SNPs of DNMT1 on chromosome 19 for all patients under investigation. The data were first analyzed using logistic regression. A fraction of the data was then removed, both at random and not at random, the missing values were imputed with the EM algorithm, and the logistic regression coefficients before and after imputation were compared. The results indicated that under both mechanisms, SNPs missing at random and SNPs missing not at random, the logistic regression coefficients estimated after EM imputation corresponded more closely to those obtained from the complete data than the coefficients estimated after deleting the records with missing values.
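The abstract does not say how the EM model was specified, so the following is only a minimal sketch of EM-based genotype imputation and not the authors' implementation. It assumes two SNPs coded as minor-allele counts 0, 1, 2, estimates their joint genotype distribution from partially observed records, and fills each gap with the most probable compatible genotype. The function names `em_joint` and `impute`, the genotype coding, and the toy data are all hypothetical. The sketch also treats the missingness as ignorable; handling data that are missing not at random would require an explicit model of the missingness mechanism, which is not shown here.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # assumed coding: number of minor alleles


def _compatible(cells, g1, g2):
    """Cells of the joint table that agree with the observed part of a record."""
    return [c for c in cells
            if (g1 is None or c[0] == g1) and (g2 is None or c[1] == g2)]


def em_joint(records, n_iter=50):
    """Estimate the joint genotype distribution of two SNPs by EM.

    records: list of (g1, g2) pairs; None marks a missing genotype.
    Returns a dict mapping each (g1, g2) cell to its estimated probability.
    """
    cells = list(product(GENOTYPES, GENOTYPES))
    p = {c: 1.0 / len(cells) for c in cells}  # uniform starting values
    for _ in range(n_iter):
        counts = {c: 0.0 for c in cells}
        for g1, g2 in records:
            # E-step: spread one unit of weight over the compatible cells,
            # proportionally to the current probability estimates.
            compat = _compatible(cells, g1, g2)
            total = sum(p[c] for c in compat)
            for c in compat:
                counts[c] += p[c] / total
        # M-step: expected counts become the new probability estimates.
        p = {c: counts[c] / len(records) for c in cells}
    return p


def impute(records, p):
    """Replace each record by its most probable compatible complete genotype pair."""
    return [max(_compatible(list(p), g1, g2), key=lambda c: p[c])
            for g1, g2 in records]


# Toy data (invented for illustration): two strongly associated SNPs.
records = [(0, 0)] * 4 + [(2, 2)] * 4 + [(0, None), (None, 2)]
p_hat = em_joint(records)
filled = impute(records, p_hat)
```

Fully observed records pass through `impute` unchanged, because exactly one cell is compatible with them. To reproduce the comparison described in the abstract, one would fit the same logistic regression to the complete data, to the data with incomplete records deleted, and to the imputed data, and then compare the coefficients.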
Similar resources
Missing data imputation in multivariable time series data
Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Frequent researches have been done on the use of diffe...
Missing value imputation on missing completely at random data using multilayer perceptrons
Data mining is based on data files which usually contain errors in the form of missing values. This paper focuses on a methodological framework for the development of an automated data imputation model based on artificial neural networks. Fifteen real and simulated data sets are exposed to a perturbation experiment, based on the random generation of missing values. These data set sizes range fr...
Simple imputation methods were inadequate for missing not at random (MNAR) quality of life data
OBJECTIVE QoL data were routinely collected in a randomised controlled trial (RCT), which employed a reminder system, retrieving about 50% of data originally missing. The objective was to use this unique feature to evaluate possible missingness mechanisms and to assess the accuracy of simple imputation methods. METHODS Those patients responding after reminder were regarded as providing missin...
Imputation methods for quantile estimation under missing at random
Imputation is frequently used to handle missing data for which multiple imputation is a popular technique. We propose a fractional hot deck imputation which produces a valid variance estimator for quantiles. In the proposed method, the imputed values are chosen from the set of respondents and are assigned with proper fractional weights that use a density function for the working model. In addit...
Imputation of Missing Values for Unsupervised Data Using the Proximity in Random Forests
This paper presents a new procedure that imputes missing values by random forests for unsupervised data. We found that it works pretty well compared with k-nearest neighbor (kNN) and rough imputations replacing the median of the variables. Moreover, this procedure can be expanded to semisupervised data sets. The rate of the correct classification is higher than that of other conventional method...
Multiple Imputation for Missing Data
Multiple imputation provides a useful strategy for dealing with data sets with missing values. Instead of filling in a single value for each missing value, Rubin’s (1987) multiple imputation procedure replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. These multiply imputed data sets are then analyzed by using standard proc...
Journal title:
Journal of Paramedical Sciences, Volume 2, Issue 3, Pages 0-0